Two problems:
- Label noise: label-flip noise (samples mislabeled as other training categories) and outlier noise (samples that belong to no training category).
- Domain shift: domain distribution mismatch between web data and consumer data.
Solutions:
- Multi-instance learning: [4] (pixel-level attention), [5], [6], [19] (image-level attention)
- Bootstrapping: [12]
- Negative learning: [18]
- Cyclical training: [20]
- Use auxiliary clean data:
  - Active learning (select informative training samples to annotate): [13]
  - Reinforcement learning (learn labeling policies): [14]
  - Analogous to semi-supervised learning, with the clean subset playing the role of labeled data.
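Two of the approaches above reduce to simple modifications of the cross-entropy loss. Below is a minimal numpy sketch (function names are my own): the soft bootstrapping loss of Reed et al. [12], which mixes the given label with the model's own prediction so that confident predictions can override noisy labels (β = 0.95 follows the paper), and the complementary-label loss of NLNL [18], which trains on "this sample is NOT class ŷ" for a randomly chosen ŷ ≠ y, a statement that remains true with high probability even when y itself is wrong.

```python
import numpy as np

def soft_bootstrap_loss(probs, noisy_label, beta=0.95):
    """Soft bootstrapping [12]: the target is a convex mix of the
    one-hot (possibly wrong) label q and the model's prediction p,
    L = -sum_k (beta*q_k + (1-beta)*p_k) * log p_k."""
    q = np.zeros_like(probs)
    q[noisy_label] = 1.0
    target = beta * q + (1.0 - beta) * probs
    return -np.sum(target * np.log(probs + 1e-12))

def negative_learning_loss(probs, noisy_label, num_classes, rng):
    """Negative learning (NLNL, [18]): sample a complementary
    class comp != noisy_label and push its probability down,
    L = -log(1 - p_comp)."""
    comp = rng.choice([c for c in range(num_classes) if c != noisy_label])
    return -np.log(1.0 - probs[comp] + 1e-12)

# usage on a single 3-class prediction
probs = np.array([0.7, 0.2, 0.1])
rng = np.random.default_rng(0)
l_boot = soft_bootstrap_loss(probs, noisy_label=0)
l_neg = negative_learning_loss(probs, noisy_label=0, num_classes=3, rng=rng)
```

Both papers apply these per-sample losses inside a standard SGD loop; the sketch only shows the loss computation itself.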
Datasets:
There are two types of label noise: synthetic label noise and web label noise.
- Large-scale web datasets: WebVision v1, WebVision v2
- Fine-grained web datasets: clothing, car, Stanford Dogs, Food-101N, MIT Indoor-67, skin disease-198
- Synthetic noisy datasets via label flipping: CIFAR-10/100
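The synthetic-noise benchmarks above are typically built by corrupting a clean dataset such as CIFAR-10/100 with symmetric label flipping. A minimal sketch of that corruption (the function name and seed handling are my own; papers also use asymmetric/pairwise variants not shown here):

```python
import numpy as np

def flip_labels(labels, noise_rate, num_classes, seed=0):
    """Symmetric label-flip noise: with probability noise_rate,
    replace a label by a uniformly random *different* class."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels).copy()
    flip = rng.random(labels.shape[0]) < noise_rate
    for i in np.flatnonzero(flip):
        labels[i] = rng.choice([c for c in range(num_classes) if c != labels[i]])
    return labels

# usage: corrupt 40% of 2000 CIFAR-10-style labels
clean = np.zeros(2000, dtype=int)
noisy = flip_labels(clean, noise_rate=0.4, num_classes=10)
```

Because the flipped class is never the original one, the realized noise rate matches `noise_rate` in expectation, which makes results comparable across papers.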
Surveys:
References
[1] Chen, Xinlei, and Abhinav Gupta. “Webly supervised learning of convolutional networks.” ICCV, 2015.
[2] Sukhbaatar, Sainbayar, et al. “Training convolutional networks with noisy labels.” arXiv preprint arXiv:1406.2080 (2014).
[3] Xiao, Tong, et al. “Learning from massive noisy labeled data for image classification.” CVPR, 2015.
[4] Zhuang, Bohan, et al. “Attend in groups: a weakly-supervised deep learning framework for learning from web data.” CVPR, 2017.
[5] Wu, Jiajun, et al. “Deep multiple instance learning for image classification and auto-annotation.” CVPR, 2015.
[6] Ilse, Maximilian, Jakub M. Tomczak, and Max Welling. “Attention-based deep multiple instance learning.” arXiv preprint arXiv:1802.04712 (2018).
[7] Lee, Kuang-Huei, et al. “CleanNet: Transfer learning for scalable image classifier training with label noise.” CVPR, 2018.
[8] Liu, Tongliang, and Dacheng Tao. “Classification with noisy labels by importance reweighting.” T-PAMI, 2015.
[9] Misra, Ishan, et al. “Seeing through the human reporting bias: Visual classifiers from noisy human-centric labels.” CVPR, 2016.
[10] Guo, Sheng, et al. “CurriculumNet: Weakly supervised learning from large-scale web images.” ECCV, 2018.
[11] Jiang, Lu, et al. “MentorNet: Learning data-driven curriculum for very deep neural networks on corrupted labels.” arXiv preprint arXiv:1712.05055 (2017).
[12] Reed, Scott, et al. “Training deep neural networks on noisy labels with bootstrapping.” arXiv preprint arXiv:1412.6596 (2014).
[13] Krause, Jonathan, et al. “The unreasonable effectiveness of noisy data for fine-grained recognition.” ECCV, 2016.
[14] Yeung, Serena, et al. “Learning to learn from noisy web videos.” CVPR, 2017.
[15] Veit, Andreas, et al. “Learning from noisy large-scale datasets with minimal supervision.” CVPR, 2017.
[16] Xu, Zhe, et al. “Webly-supervised fine-grained visual categorization via deep domain adaptation.” T-PAMI, 2016.
[17] Li, Yuncheng, et al. “Learning from noisy labels with distillation.” ICCV, 2017.
[18] Kim, Youngdong, et al. “NLNL: Negative learning for noisy labels.” ICCV, 2019.
[19] “MetaCleaner: Learning to Hallucinate Clean Representations for Noisy-Labeled Visual Recognition”, CVPR, 2019.
[20] Huang, Jinchi, et al. “O2U-Net: A simple noisy label detection approach for deep neural networks.” ICCV, 2019.